Segmenting documents by stylistic character
Authors
Abstract
As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author’s syntactic preferences, whereas low-level and vocabulary-based features were not found to be useful. An alternative approach with character bigrams was not successful.
Similar resources
Segmenting a document by stylistic character
As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author’s syntactic preferences. Low-level and...
A Novel Approach of Segmenting Touching and Kerned Characters
Character segmentation is a critical step of an OCR system. In this paper we discuss segmentation approaches for touching and kerned characters. A non-linear segmentation path-based algorithm for segmenting touching and kerned characters is put forward. First, touching and kerned characters are extracted and segregated from other characters by using character projections and recognition results. The...
Style-Directed Document Recognition
We are developing a document recognition system that can be tunably optimized for performance on documents of specific styles. We interactively generate XML to encode specific knowledge about a class of documents to be input to a recognition system. The encoding includes attributes of document logical structure as well as layout structure constraints. The encoding of document style is used to a...
Extracting and Segmenting Container Name from Container Images
Container name extraction is very important to the modern container management system. Similar techniques have been suggested for vehicle license plate recognition in past decades. Container name extraction is more complex than license plate extraction because of the severity of nonuniform illumination and invalidation of color information. The main purpose of this paper is to propose a new me...
Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization
In this paper we describe a dynamic normalization process applied to social network multilingual documents (Facebook and Twitter) to improve the performance of the Author profiling task for short texts. After the normalization process, n-grams of characters and n-grams of POS tags are obtained to extract all the possible stylistic information encoded in the documents (emoticons, character flood...
Journal title:
- Natural Language Engineering
Volume 11, issue
Pages -
Publication date: 2005